163 research outputs found

    The DAWGPAWS pipeline for the annotation of genes and transposable elements in plant genomes

    Background: High quality annotation of the genes and transposable elements in complex genomes requires a human-curated integration of multiple sources of computational evidence. These evidences include results from a diversity of ab initio prediction programs as well as homology-based searches. Most of these programs operate on a single contiguous sequence at a time, and the results are generated in a diverse array of readable formats that must be translated to a standardized file format. These translated results must then be concatenated into a single source, and then presented in an integrated form for human curation. Results: We have designed, implemented, and assessed a Perl-based workflow named DAWGPAWS for the generation of computational results for human curation of the genes and transposable elements in plant genomes. The use of DAWGPAWS was found to accelerate annotation of 80–200 kb wheat DNA inserts in bacterial artificial chromosome (BAC) vectors by approximately twenty-fold and to also significantly improve the quality of the annotation in terms of completeness and accuracy. Conclusion: The DAWGPAWS genome annotation pipeline fills an important need in the annotation of plant genomes by generating computational evidences in a high throughput manner, translating these results to a common file format, and facilitating the human curation of these computational results. We have verified the value of DAWGPAWS by using this pipeline to annotate the genes and transposable elements in 220 BAC insertions from the hexaploid wheat genome (Triticum aestivum L.). DAWGPAWS can be applied to annotation efforts in other plant genomes with minor modifications of program-specific configuration files, and the modular design of the workflow facilitates integration into existing pipelines.

    Reference Genome Sequence of the Model Plant Setaria

    We generated a high-quality reference genome sequence for foxtail millet (Setaria italica). The ~400-Mb assembly covers ~80% of the genome and >95% of the gene space. The assembly was anchored to a 992-locus genetic map and was annotated by comparison with >1.3 million expressed sequence tag reads. We produced more than 580 million RNA-Seq reads to facilitate expression analyses. We also sequenced Setaria viridis, the ancestral wild relative of S. italica, and identified regions of differential single-nucleotide polymorphism density, distribution of transposable elements, small RNA content, chromosomal rearrangement and segregation distortion. The genus Setaria includes natural and cultivated species that demonstrate a wide capacity for adaptation. The genetic basis of this adaptation was investigated by comparing five sequenced grass genomes. We also used the diploid Setaria genome to evaluate the ongoing genome assembly of a related polyploid, switchgrass (Panicum virgatum).

    Construction and Homologous Expression of a Maize Adh1


    Discovery and assembly of repeat family pseudomolecules from sparse genomic sequence data using the Assisted Automated Assembler of Repeat Families (AAARF) algorithm

    Background: Higher eukaryotic genomes are typically large, complex and filled with both genes and multiple classes of repetitive DNA. The repetitive DNAs, primarily transposable elements, are a rapidly evolving genome component that can provide the raw material for novel selected functions and also indicate the mechanisms and history of genome evolution in any ancestral lineage. Despite their abundance, universality and significance, studies of genomic repeat content have been largely limited to analyses of the repeats in fully sequenced genomes. Results: In order to facilitate a broader range of repeat analyses, the Assisted Automated Assembler of Repeat Families algorithm has been developed. This program, written in PERL and with numerous adjustable parameters, identifies sequence overlaps in small shotgun sequence datasets and walks them out to create long pseudomolecules representing the most abundant repeats in any genome. Testing of this program in maize indicated that it found and assembled all of the major repeats in one or more pseudomolecules, including coverage of the major Long Terminal Repeat retrotransposon families. Both Sanger sequence and 454 datasets were appropriate. Conclusion: These results now indicate that hundreds of higher eukaryotic genomes can be efficiently characterized for the nature, abundance and evolution of their major repetitive DNA components.

    On the Tetraploid Origin of the Maize Genome

    Data from cytological and genetic mapping studies suggest that maize arose as a tetraploid. Two previous studies investigating the most likely mode of maize origin arrived at different conclusions. Gaut and Doebley [7] proposed a segmental allotetraploid origin of the maize genome and estimated that the two maize progenitors diverged 20.5 million years ago (mya). In a similar study using a larger data set, Brendel and colleagues (quoted in [8]) suggested a single genome duplication at 16 mya. One of the key components of such analyses is to examine sequence divergence among strictly orthologous genes. In order to identify such genes, Lai and colleagues [10] sequenced five duplicated chromosomal regions from the maize genome and the orthologous counterparts from the sorghum genome. They also identified the orthologous regions in rice. Using positional information of genetic components, they identified 11 orthologous genes across the two duplicated regions of maize, and the sorghum and rice regions. Swigonova et al. [12] analyzed the 11 orthologues, and showed that all five maize chromosomal regions duplicated at the same time, supporting a tetraploid origin of maize, and that the two maize progenitors diverged from each other at about the same time as each of them diverged from sorghum, about 11.9 mya.

    Adaptive Evolution of Signaling Partners

    Proteins that interact coevolve their structures. When mutation disrupts the interaction, compensation by the partner occurs to restore interaction; otherwise counterselection occurs. We show in this study how a destabilizing mutation in one protein is compensated by a stabilizing mutation in its protein partner, and we describe their coevolving path. The pathway in this case, and likely a general principle of coevolution, is that the compensatory change must tolerate both the original and derived structures with equivalence in function and activity. Evolution of the structure of signaling elements in a network is constrained by specific protein pair interactions, by requisite conformational changes, and by catalytic activity. The heterotrimeric G protein-coupled signaling is a paragon of this protein interaction/function complexity, and our deep understanding of this pathway in diverse organisms lends itself to evolutionary study. Regulators of G protein Signaling (RGS) proteins accelerate the intrinsic GTP hydrolysis rate of the Gα subunit of the heterotrimeric G protein complex. An important RGS-contact site is a hydroxyl-bearing residue on the switch I region of Gα subunits in animals and most plants, such as Arabidopsis. The exception is the grasses (e.g., rice, maize, sugarcane, millets); these plants have Gα subunits that replaced the critical hydroxyl-bearing threonine with a destabilizing asparagine shown to disrupt interaction between Arabidopsis RGS protein (AtRGS1) and the grass Gα subunit. With one known exception (Setaria italica), grasses do not encode RGS genes. One parsimonious deduction is that the RGS gene was lost in the ancestor to the grasses and then recently acquired horizontally in the lineage S. italica from a nongrass monocot. Like all investigated grasses, S. italica has the Gα subunit with the destabilizing asparagine residue in the protein interface but, unlike other known grass genomes, still encodes an expressed RGS gene, SiRGS1. SiRGS1 accelerates GTP hydrolysis at similar concentrations for Gα subunits containing either the stabilizing (AtGPA1) or destabilizing (RGA1) interface residue. SiRGS1 does not use the hydroxyl-bearing residue on Gα to promote GAP activity and has a larger Gα-interface pocket fitting to the destabilizing Gα. These findings indicate that SiRGS1 adapted to a deleterious mutation on Gα using existing polymorphism in the RGS protein population.

    Gene content and distribution in the nuclear genome of Fragaria vesca

    Thirty fosmids were randomly selected from a library of Fragaria vesca subsp. americana (cv. Pawtuckaway) DNA. These fosmid clones were individually sheared, and ∼4- to 5-kb fragments were subcloned. Subclones on a single 384-well plate were sequenced bidirectionally for each fosmid. Assembly of these data yielded 12 fosmid inserts completely sequenced, 14 inserts as 2 to 3 contiguous sequences (contigs), and 4 inserts with 5 to 9 contigs. In most cases, a single unambiguous contig order and orientation was determined, so no further finishing was required to identify genes and their relative arrangement. One hundred fifty-eight genes were identified in the ∼1.0 Mb of nuclear genomic DNA that was assembled. Because these fosmids were randomly chosen, this allowed prediction of the genetic content of the entire ∼200 Mb F. vesca genome as about 30,500 protein-encoding genes, plus >4700 truncated gene fragments. The genes are mostly arranged in gene-rich regions, to a variable degree intermixed with transposable elements (TEs). The most abundant TEs in F. vesca were found to be long terminal repeat (LTR) retrotransposons, and these comprised about 13% of the DNA analyzed. Over 30 new repeat families were discovered, mostly TEs, and the total TE content of F. vesca is predicted to be at least 16%.
    EEA Balcarce. Affiliations: Pontaroli, Ana Clara (Instituto Nacional de Tecnología Agropecuaria (INTA), Estación Experimental Agropecuaria Balcarce, Argentina; University of Georgia, Department of Genetics, United States); Rogers, Rebekah L. (Harvard University, Department of Organismic and Evolutionary Biology, United States; University of Georgia, Department of Genetics, United States); Qian, Zhang (University of New Hampshire, Department of Biological Sciences, United States); Shields, Melanie E. (University of New Hampshire, Department of Biological Sciences, United States); Davis, Thomas M. (University of New Hampshire, Department of Biological Sciences, United States); Folta, Kevin M. (University of Florida, Horticultural Sciences Department, United States); SanMiguel, Phillip (Purdue University, Department of Horticulture and Landscape Architecture, United States); Bennetzen, Jeffrey L. (University of Georgia, Department of Genetics, United States).

    Identification of stress-responsive genes in an indica rice (Oryza sativa L.) using ESTs generated from drought-stressed seedlings

    The impacts of drought on plant growth and development limit cereal crop production worldwide. Rice (Oryza sativa) productivity and production are severely affected by recurrent droughts in almost all agroecological zones. With the advent of molecular and genomic technologies, emphasis is now placed on understanding the mechanisms of genetic control of the drought-stress response. In order to identify genes associated with water-stress response in rice, ESTs generated from a normalized cDNA library, constructed from drought-stressed leaf tissue of an indica cultivar, Nagina 22, were used. Analysis of 7794 cDNA sequences led to the identification of 5815 rice ESTs. Of these, 334 exhibited no significant sequence homology with any rice ESTs or full-length cDNAs in public databases, indicating that these transcripts are enriched during drought stress. Analysis of these 5815 ESTs led to the identification of 1677 unique sequences. To characterize this drought transcriptome further and to identify candidate genes associated with the drought-stress response, the rice data were compared with those for abiotic stress-induced sequences obtained from expression profiling studies in Arabidopsis, barley, maize, and rice. This comparative analysis identified 589 putative stress-responsive genes (SRGs) that are shared by these diverse plant species. Further, the identified leaf SRGs were compared to expression profiles for a drought-stressed rice panicle library to identify common sequences. Significantly, 125 genes were found to be expressed under drought stress in both tissues. The functional classification of these 125 genes showed that a majority of them are associated with cellular metabolism, signal transduction, and transcriptional regulation.

    An examination of targeted gene neighborhoods in strawberry

    Background: Strawberry (Fragaria spp.) is the familiar name of a group of economically important crop plants and wild relatives that also represent an emerging system for the study of gene and genome evolution. Its small stature, rapid seed-to-seed cycle, transformability and minuscule basic genome make strawberry an attractive system to study processes related to plant physiology, development and crop production; yet it lacks substantial genomics-level resources. This report addresses this deficiency by characterizing 0.71 Mbp of gene space from a diploid species (F. vesca). The twenty large genomic tracks (30-52 kb) captured as fosmid inserts comprise gene regions with roles in flowering, disease resistance, and metabolism. Results: A detailed description of the studied regions reveals 131 Blastx-supported gene sites and eight additional EST-supported gene sites. Only 15 genes have complete EST coverage, enabling gene modelling, while 76 lack EST support. Instances of microcolinearity with Arabidopsis thaliana were identified in twelve inserts. A relatively high portion (25%) of targeted genes were found in unanticipated tandem duplications. The effectiveness of six FGENESH training models was assessed via comparisons among ab initio predictions and homology-based gene and start/stop codon identifications. Fourteen transposable-element-related sequences and 158 simple sequence repeat loci were delineated. Conclusions: This report details the structure and content of targeted regions of the strawberry genome. The data indicate that the strawberry genome is gene-dense, with an average of one protein-encoding gene or pseudogene per 5.9 kb. Current overall EST coverage is sparse. The unexpected gene duplications and their differential patterns of EST support suggest possible subfunctionalization or pseudogenization of these sequences. This report provides a high-resolution depiction of targeted gene neighborhoods that will aid whole-genome sequence assembly, provide valuable tools for plant breeders and advance the understanding of strawberry genome evolution.

    Conference Review On the tetraploid origin of the maize genome

    Data from cytological and genetic mapping studies suggest that maize arose as a tetraploid. Two previous studies investigating the most likely mode of maize origin arrived at different conclusions. [...] Swigonova et al. [12] analyzed the 11 orthologues, and showed that all five maize chromosomal regions duplicated at the same time, supporting a tetraploid origin of maize, and that the two maize progenitors diverged from each other at about the same time as each of them diverged from sorghum, about 11.9 mya.